LONG-TERM GOALS

This one-year effort will focus on the transition of FERI’s machine learning algorithms for
HyperSpectral Imagery (HSI) in the VSW into a distributable code set. This will provide a stable code
platform for the application and transition of machine learning-based hyperspectral classification
techniques into 6.3/6.4 programs.

OBJECTIVES 

Our objective is to focus on three areas of application research and transition. First, we will transition
our machine learning-based algorithms and computer code for the determination of bathymetry,
bottom type, and water column Inherent Optical Properties (IOPs) from HyperSpectral Imagery (HSI)
into a deliverable Message Passing Interface (MPI) program that may be easily used by other
researchers and military operators. Second, we will use this program to determine the impacts of the
granularity of the classification database on the inversion of bathymetry, bottom type, and IOPs. Third,
we will move beyond the use of single-pixel HSI inversion to the use of spatial context-filtering to
remove the pixel-to-pixel noise inherent in the HSI data.

APPROACH 

Task  1 

In previous work, a Look-Up Table (LUT) algorithm was used to accurately predict bathymetry
(Mobley et al. 2002, Bissett et al. 2004, Bissett et al. 2005, Mobley et al. 2005, Lesser and Mobley,
2008). The LUT approach belongs to a larger body of artificial intelligence work concerned with
algorithms and techniques that “teach” machines to learn from the examination of data and rules. This
body of work is aptly called “machine learning”, and its techniques include decision trees,
genetic algorithms, and neural networks. More specifically, the LUT approach is a special case of the
k-Nearest Neighbor (kNN) algorithm, which is in the family of supervised learning algorithms.

Our use of the kNN algorithm maps a single HSI remote sensing reflectance vector, Rrs(λ), onto a
database of estimated Rrs(λ). This database is created by providing the attributes of bathymetry,
spectral bottom reflectance, and spectral IOPs to the radiative transfer routines of Ecolight (a
high-speed variant of Hydrolight; Mobley, 1994). We select the classification of the measured Rrs
vector based on the best match of measured Rrs(λ) to estimated Rrs(λ). The LUT algorithm is based
on a single best fit for our classification, i.e. k = 1. However, more recent work suggested that we
could achieve a better classification by selecting a larger number for k, e.g. k = 50 (Bissett et al.
2006a). This larger k provides better accuracy and precision, and it also allows us to create
confidence intervals for our classifications of bathymetry.

When classifying new spectra, the distance or angle between each measured spectrum and each
estimated spectrum in the database is calculated. The k nearest neighbors to that spectrum (those
having the smallest distances or angles) are considered sufficiently qualified to predict the
corresponding attributes of bathymetry, bottom type, and IOP set. We have used the following metrics
for the calculation of distance (Euclidean, Manhattan, Chebyshev, Canberra, and Bray-Curtis) and/or
angle (Angular Separation and Correlation Coefficient). In general, our applications suggest that the
Manhattan distance and the Correlation Coefficient angle are the best metrics to use for this
algorithm. Once the set of nearest neighbors is determined, the attribute (e.g. bathymetry) of a pixel
may be determined by a majority vote among the k nearest neighbor vectors. In the event of a tie, a
prediction is made randomly from amongst the tied majority classes.
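The matching and voting steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's MPI code: the function names (`manhattan`, `correlation_distance`, `knn_vote`), the toy three-band database, and the use of 1 - r to turn the correlation coefficient into a dissimilarity are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from math import sqrt

def manhattan(a, b):
    """Sum of absolute differences between two equal-length spectra."""
    return sum(abs(x - y) for x, y in zip(a, b))

def correlation_distance(a, b):
    """1 - Pearson r, so that well-correlated spectra are 'close' (assumed form)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / sqrt(va * vb)

def knn_vote(measured, database, k, metric=manhattan, rng=random):
    """database is a list of (spectrum, label) pairs. Returns the majority
    label among the k nearest spectra; ties are broken at random."""
    nearest = sorted(database, key=lambda entry: metric(measured, entry[0]))[:k]
    counts = Counter(label for _, label in nearest)
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    return rng.choice(tied)

# Toy database: three-band spectra labeled with a depth class in meters.
toy_db = [
    ([0.010, 0.020, 0.015], 2.0),
    ([0.011, 0.021, 0.014], 2.0),
    ([0.012, 0.019, 0.016], 4.0),
    ([0.030, 0.040, 0.035], 10.0),
]
pixel = [0.0105, 0.0205, 0.0148]
print(knn_vote(pixel, toy_db, k=1))  # k = 1 reproduces the LUT-style single best match
print(knn_vote(pixel, toy_db, k=3))  # majority vote among the three nearest spectra
```

Passing `metric=correlation_distance` switches the same vote to the correlation-based measure without changing the rest of the code.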

The computer code used in our creation of the estimated Rrs(λ) database and the spectral matching of
the measured versus estimated Rrs(λ) is functional for scientific research; however, it is not well
developed for transition for use by others in testing and evaluation applications.

The  tasks  of  this  project  are  as  follows: 

1)  We  will  build  upon  our  past  research  efforts  to  provide  a  Message  Passing  Interface 
(MPI)  executable  version  of  our  kNN  workbench  for  the  inversion  of  hyperspectral  imagery. 

2) We will use the code from Task 1 to rapidly test the impacts of granularity of attribute selection
on the accuracy and precision of bathymetry estimated from our kNN code and the HSI data from
Horseshoe Reef (Bissett et al. 2006b).

3) We will evaluate two types of context-filtering - (1) pre-filtering of the Rrs(λ) spectra
before classification, and (2) context-filtering of the retrieved attributes after classification.

This year’s work focused on Task 3 - context-filtering. The first type of context-filtering seeks to
reduce the noise in Rrs(λ) spectra by replacing the spectrum value at each wavelength with the median
value of the spectra in a spatial area surrounding the pixel of interest, say a 3 x 3 grid of pixels centered
on the one of interest. This spatial filter is applied wavelength by wavelength. At wavelengths where
Rrs(λ) is mostly signal, the final spectrum will not change by much. At wavelengths where Rrs(λ) is
noisy, the noise in the surrounding pixels will tend to average out, and the final spectrum values over
the entire image area will be less noisy than the original.
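A minimal sketch of this wavelength-by-wavelength median filter is given below, in plain Python for clarity. The function names, the `cube[w][i][j]` layout (wavelength, row, column), and the handling of edge pixels (using only the neighbors that fall inside the image) are assumptions made for the illustration; the report does not specify them.

```python
from statistics import median

def median_filter_band(band, n=3):
    """Apply an n x n median filter to one 2D band (a list of rows).
    Edge pixels use only the neighbors inside the image (assumption)."""
    rows, cols = len(band), len(band[0])
    half = n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [band[a][b]
                      for a in range(max(0, i - half), min(rows, i + half + 1))
                      for b in range(max(0, j - half), min(cols, j + half + 1))]
            out[i][j] = median(window)
    return out

def median_filter_cube(cube, n=3):
    """Filter each wavelength band of cube[w][i][j] independently."""
    return [median_filter_band(band, n) for band in cube]

# Two-band toy image: band 0 is clean, band 1 has one noisy center pixel.
clean_band = [[0.02] * 3 for _ in range(3)]
noisy_band = [[0.02, 0.02, 0.02],
              [0.02, 0.09, 0.02],
              [0.02, 0.02, 0.02]]
filtered = median_filter_cube([clean_band, noisy_band])
print(filtered[0] == clean_band)  # the clean band is left unchanged
print(filtered[1][1][1])          # the outlier is replaced by the neighborhood median
```

In this toy case the clean band passes through untouched while the single outlier in the noisy band is suppressed, which mirrors the behavior described in the paragraph above.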

The second type of context-filtering involves post-processing the retrievals themselves, rather than the
original image spectra. In the case of real-numbered attributes, such as bathymetry, we can apply a
median filter to the retrieved depth. For bottom type and IOP set, the way forward is less clear. Each
of these attributes is assigned a type with a specific vector (or set of vectors in the case of IOPs) of
spectral values. How we filter “Dark Sediment” with “Sparse Vegetation” or “Highly absorbing and
scattering waters #1” with “Case 1, chlorophyll a = 0.5 mg m^-3” will be a challenge. It may require
an iterative solution that context-filters bathymetry first and then solves the kNN again using a
constrained bathymetry solution approach. It may also be highly dependent on the granularity study in
Task 2. These are the issues that we will address in this Task.

WORK  COMPLETED 

Task 3 starts with a baseline set of statistics with which to compare our spectral matching approaches
to the “true” bathymetry measured with acoustical techniques. In addition to previously used estimates
(see below), we include a new estimate of “spikiness” in the retrieval of bathymetry from our
spectrum matching techniques. Spikiness, S, is defined in the depth estimates as follows. For a given
pixel (i,j) with retrieved depth z(i,j), the average depth of the 4 neighboring pixels is

zavg4 = 0.25 [z(i-1, j) + z(i+1, j) + z(i, j-1) + z(i, j+1)].

The spikiness, S(i,j), of the retrieved depth at (i,j) is then defined as the absolute percent difference
between the depth z(i,j) and zavg4,

S(i,j) = 100 |z(i,j) - zavg4| / zavg4.

For example, at kNN = 1 (a single-value LUT retrieval), if the retrieval z(i,j) = 5 m or 15 m, and
zavg4 = 10 m, then S(i,j) = 50%. Note that a linearly sloping bottom is the same as a level bottom as
regards the value of zavg4. Thus a change in depth from one pixel to the next because of a sloping
bottom is not recorded as spikiness. This metric is best suited for detecting a single spiky pixel.
However, if a group of pixels is spiky, then some of the spiky pixels may be included in the zavg4
value, and the true spikiness may be underestimated for pixel (i,j). Likewise, a sharp change in bottom
depth, e.g., due to a coral head, may be recorded as a depth spike even though the LUT retrieval is
correct.
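The definition can be checked numerically with a short sketch. The function name `spikiness` and the list-of-rows layout for the depth map are assumptions for the illustration, and the function is only meaningful for interior pixels that have all four neighbors.

```python
def spikiness(z, i, j):
    """Absolute percent difference between z[i][j] and the mean of its four
    edge-adjacent neighbors (zavg4). Valid for interior pixels only."""
    zavg4 = 0.25 * (z[i - 1][j] + z[i + 1][j] + z[i][j - 1] + z[i][j + 1])
    return 100.0 * abs(z[i][j] - zavg4) / zavg4

# Worked example from the text: z = 5 m or 15 m against zavg4 = 10 m.
spike_low = [[10, 10, 10], [10, 5, 10], [10, 10, 10]]
spike_high = [[10, 10, 10], [10, 15, 10], [10, 10, 10]]
# A bottom sloping linearly along j: the four neighbors average to the center depth.
slope = [[8, 10, 12], [8, 10, 12], [8, 10, 12]]

print(spikiness(spike_low, 1, 1))   # 50.0
print(spikiness(spike_high, 1, 1))  # 50.0
print(spikiness(slope, 1, 1))       # 0.0
```

The third case illustrates why a uniformly sloping bottom does not register as spiky: the shallower and deeper neighbors cancel in zavg4.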

Other statistical measures for “goodness of fit” from previous efforts include -

1. The average percent difference in LUT vs acoustic depths (a negative/positive value means
that the LUT depths are on average shallower/deeper than the acoustic depths)

2. The average difference in meters in LUT vs acoustic depths (a negative/positive value means
that the LUT depths are on average shallower/deeper than the acoustic depths)

3. The standard deviation in meters of the LUT vs acoustic depths

4. The correlation coefficient, r, between the LUT and acoustic depths

5. The percent of pixels for which the LUT depth is within ±1 m of the correct depth

6. The percent of pixels for which the LUT depth is within ±25% of the correct depth
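The six measures listed above could be computed along the following lines. This is a sketch under assumptions: the function name and dictionary keys are hypothetical, the standard deviation is taken over the LUT-minus-acoustic depth differences in population form, and the acoustic depth is treated as the "correct" depth for the percentage measures.

```python
from math import sqrt

def depth_fit_stats(lut, acoustic):
    """Goodness-of-fit measures for paired LUT and acoustic depths (meters)."""
    n = len(lut)
    diff = [l - a for l, a in zip(lut, acoustic)]          # negative = LUT shallower
    pct = [100.0 * d / a for d, a in zip(diff, acoustic)]  # signed percent difference
    mean_diff = sum(diff) / n
    std_diff = sqrt(sum((d - mean_diff) ** 2 for d in diff) / n)  # population form (assumed)
    ml, ma = sum(lut) / n, sum(acoustic) / n
    cov = sum((l - ml) * (a - ma) for l, a in zip(lut, acoustic))
    var_l = sum((l - ml) ** 2 for l in lut)
    var_a = sum((a - ma) ** 2 for a in acoustic)
    return {
        "mean_pct_diff": sum(pct) / n,
        "mean_diff_m": mean_diff,
        "std_diff_m": std_diff,
        "r": cov / sqrt(var_l * var_a),
        "pct_within_1m": 100.0 * sum(abs(d) <= 1.0 for d in diff) / n,
        "pct_within_25pct": 100.0 * sum(abs(p) <= 25.0 for p in pct) / n,
    }

# Toy data: four pixels with LUT-retrieved and acoustic depths in meters.
stats = depth_fit_stats(lut=[3.0, 9.0, 18.0, 20.0], acoustic=[5.0, 10.0, 15.0, 20.0])
for name, value in stats.items():
    print(name, round(value, 3))
```

Keeping the signed and absolute measures side by side matters because, as in this toy case, a mean difference near zero can hide individual pixels that are well outside the ±1 m or ±25% tolerances.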

The baseline for our comparison of various selections of spatial filtering parameters and kNN
parameters is seen in Figure 1. This figure shows the bathymetry retrievals for the unfiltered, kNN = 1
(LUT) parameters of our spectrum matching algorithms. In summary, we now have six quantitative
measures of the overall accuracy of depth retrievals and two measures of the spikiness of depth
retrievals. These metrics are used below to compare the effects of spatial smoothing of input Rrs
spectra, of spatial smoothing of retrieved depths, and of the type of kNN analysis.

Figure 1. A 2D plot of retrieved depths, with the actual LUT-retrieved depths binned into 2-m bins.
Even with the binning, there is noticeable speckle in the deeper waters at the upper right.


We created a matrix of combinations for testing kNN analysis, Rrs averaging, and depth averaging,
which yields a 3 x 3 x 3 solution matrix of 27 different combinations for analysis. The following list
provides a brief summary of the results.
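For orientation, the 27-case test matrix can be enumerated as below. The specific levels on each axis are an assumption inferred from the results that follow (k = 1 versus the average or median of k = 30, and no smoothing versus 3 x 3 or 5 x 5 windows); the report does not list them explicitly, and the variable names are hypothetical.

```python
from itertools import product

# Assumed levels for each of the three axes of the test matrix.
knn_modes = ["k=1", "mean of k=30", "median of k=30"]
rrs_smoothing = [1, 3, 5]    # n x n window applied to Rrs; 1 means no smoothing
depth_smoothing = [1, 3, 5]  # n x n window applied to retrieved depths

combinations = list(product(knn_modes, rrs_smoothing, depth_smoothing))
print(len(combinations))  # 27
for mode, n_rrs, n_depth in combinations[:3]:
    print(mode, n_rrs, n_depth)
```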

1. kNN analysis does not help if the input Rrs spectrum is bad

2. Using the median of k = 30 depths gives slightly better signed depth errors than does the
average of 30 depths

3. Using the average of k = 30 depths gives somewhat less spikiness (smaller average S values,
and fewer pixels with S > 25%) than does the median of k = 30 values

4. Other goodness-of-fit metrics are about the same for the average and median of k = 30 values

5. The average and median of k = 30 values give smaller signed depth errors (-0.8 to -2%) than
does k = 1 (-7.0 to -7.4%), regardless of what smoothing is applied

6. The k = 1 depths give a smaller standard deviation of the LUT vs acoustic depth errors than
does either the average or median of k = 30

7. Smoothing of the retrieved depths reduces spikiness much more than does a corresponding
(having the same value of n) smoothing of the Rrs

8. The average of k = 30 values reduces both average and extreme spikiness more than does the
median

These  results  are  very  encouraging  when  compared  to  our  baseline  retrievals  (Figures  4-7). 

However, there is no single “best” methodology that gives superior values for all error metrics.
Nevertheless, it appears that a reasonable recommendation (at least for the Horseshoe Reef image) is
to:

1. use the median of k = 30 values to estimate the depth at each pixel (although using the average
of k = 30 is about the same), which will give the most accurate average signed depth retrievals

2. definitely perform 3 x 3 or 5 x 5 spatial smoothing of the retrieved depths, which will greatly
reduce the spikiness and thus further decrease the depth errors

3. optionally also perform 3 x 3 or 5 x 5 spatial smoothing of the Rrs spectra before doing the LUT
matching (Figures 2-3)
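The first two recommendations can be sketched together as follows. This is an illustrative outline only: the function names, the toy two-band database, the choice of Manhattan distance, the use of a median window for the depth smoothing, and the edge handling are assumptions for the example, and k = 3 is used in place of k = 30 to keep the toy data small.

```python
from statistics import mean, median

def knn_depth(measured, database, k, reducer=median):
    """database is a list of (spectrum, depth_m) pairs. Reduce the depths of the
    k nearest spectra (Manhattan distance) with `reducer` (median or mean)."""
    def dist(entry):
        return sum(abs(x - y) for x, y in zip(measured, entry[0]))
    nearest = sorted(database, key=dist)[:k]
    return reducer([depth for _, depth in nearest])

def smooth_depths(z, n=3):
    """n x n median smoothing of a depth map; edges use in-image neighbors only."""
    rows, cols, half = len(z), len(z[0]), n // 2
    return [[median([z[a][b]
                     for a in range(max(0, i - half), min(rows, i + half + 1))
                     for b in range(max(0, j - half), min(cols, j + half + 1))])
             for j in range(cols)]
            for i in range(rows)]

# Step 1: per-pixel depth from the k nearest database spectra.
toy_db = [
    ([0.010, 0.020], 2.0),
    ([0.011, 0.021], 3.0),
    ([0.012, 0.022], 10.0),
    ([0.050, 0.060], 20.0),
]
pixel = [0.0105, 0.0205]
print(knn_depth(pixel, toy_db, k=3))                # median of 2, 3, 10 -> 3.0
print(knn_depth(pixel, toy_db, k=3, reducer=mean))  # mean of 2, 3, 10 -> 5.0

# Step 2: spatial smoothing of the retrieved depth map removes an isolated spike.
depth_map = [[5.0, 5.0, 5.0], [5.0, 12.0, 5.0], [5.0, 5.0, 5.0]]
print(smooth_depths(depth_map)[1][1])               # 5.0
```

The `reducer` argument makes it easy to compare the median and the average of the k depths on the same data, which is the comparison summarized in the results list.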

Figure 2. A 2D plot of retrieved depths, with the actual kNN-retrieved depths
binned into 2-m bins and color-coded.

Figure 3. Goodness-of-fit results from kNN vs. acoustic depths for optional retrieval.


IMPACT/APPLICATIONS 

This effort will deliver an application for testing and evaluating our machine learning approaches to
bathymetry estimation in Very Shallow Waters (VSW). While it is being demonstrated on
hyperspectral imagery, the techniques and computer code may be used with any set of spectral
reflectance data. As such, the deliverables from this effort will allow others to create maps of depths,
bottom types, and water clarity from a variety of airborne and space-based spectral sensors planned for
operational deployment.

RELATED  PROJECTS 

This work is being conducted in conjunction with Dr. Curtis D. Mobley at Sequoia Scientific, Inc.,
who is funded under this effort for the collaboration as well as under other collaborative spectrum
matching funding. The techniques developed here are now being applied to imagery of Australian
coastal waters in a comparison of several different hyperspectral remote sensing algorithms for a
variety of environments. That comparison study is being led by A. Dekker of CSIRO. The kNN
algorithms developed under this grant are being transitioned within an application appliance to be
delivered to the Naval Oceanographic Office (N00014-09-C-0553) in October 2009.